0368-3248-01 - Algorithms in Data Mining Fall 2013 Lecture 3: Item frequency estimation in streams
Abstract
Say we are given a stream of elements X = [x1, . . . , xN] where xi ∈ {a1, . . . , an}. Let fi denote the number of times element ai appears in the stream, i.e., fi = |{j | xj = ai}|. Our goal is to estimate fi for all frequent elements. This can be solved exactly by keeping a counter for each element of {a1, . . . , an}. Alas, this might require Θ(n) memory. Here we look for methods that approximate the values of fi using o(n) memory.
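As a minimal sketch of the two regimes described above, the Python below contrasts exact counting with a small-memory approximation. The abstract does not say which approximation method the lecture uses; the Misra-Gries summary is swapped in here as one standard technique, and the function names and the parameter k are illustrative assumptions rather than anything taken from the lecture.

```python
from collections import Counter


def exact_counts(stream):
    """Exact frequencies f_i: one counter per distinct element.

    In the worst case this keeps n counters, i.e. Theta(n) memory.
    """
    return Counter(stream)


def misra_gries(stream, k):
    """Approximate frequencies using at most k - 1 counters.

    Assumption: this is the Misra-Gries summary, not necessarily the
    lecture's algorithm. For a stream of length N, every estimate c of
    f_i satisfies f_i - N/k <= c <= f_i (elements without a counter
    have estimate 0).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement every counter and drop the
            # ones that reach zero. The new element x is discarded.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    stream = list("abacabadabaeafag")
    print("exact:      ", dict(exact_counts(stream)))
    print("approximate:", misra_gries(stream, k=4))
```

The trade-off is the usual one: a larger k tightens the additive error N/k at the cost of more counters, so only elements that occur more than N/k times are guaranteed to survive in the summary.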
Similar resources
0368-3248-01 - Algorithms in Data Mining Fall 2013 Lecture 4: Frequency Moment Estimation in Streams
Estimating f0. Here we describe an algorithm for estimating f0 which merges (and hopefully simplifies) ideas from [1] and [2]. First, assume a hash function h that maps each element a uniformly to [0, 1]. Let us define a random variable X = min_i h(ai). Intuitively, X should be roughly 1/m and therefore 1/X should be a fair estimate of m. This is almost true. In what comes next we make this into an exact statement. Let ...
0368-3248-01 - Algorithms in Data Mining Fall 2013 Lecture 4: Home Assignment, Due Dec 3rd
Warning: This note may contain typos and other inaccuracies which are usually discussed during class. Please do not cite this note as a reliable source. If you find mistakes, please inform me. 1 Probabilistic inequalities setup. In this question you will be asked to derive the three most used probabilistic inequalities for a specific random variable. Let x1, . . . , xn be independent {−1, 1} va...
0368-3248-01 - Algorithms in Data Mining Fall 2013 Lecture 5: Random-projection
Before we pick this distribution and show that Equation 1 holds for it, let us first see that this gives the opening statement. Consider a set of n points x1, . . . , xn in Euclidean space R^d. Embedding these points into a lower dimension while preserving all distances between them up to distortion 1±ε means approximately preserving the norms of all (n choose 2) vectors xi − xj. Assuming Equation 1 ...
Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding Window Techniques
Online mining of streaming data is one of the most important issues in data mining. In this paper, we proposed an efficient one.frequent item sets over a transaction-sensitive sliding window), to mine the set of all frequent item sets in data streams with a transaction-sensitive sliding window. An effective bit-sequence representation of items is used in the proposed algorit...
Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows
Today, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that appear frequently in the data set and mark them as frequent patterns, so that users can make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...